Automatically Finding Good Clusters with Seed K-Means
Authors
Abstract
For finding biologically relevant groups of genes in gene expression data obtained by microarray technologies, k-means clustering is one of the most popular approaches because it is easy to use and simple to implement. However, k-means chooses its initial points at random, which makes it impossible to obtain reliable results without repeating the entire clustering process many times [2]. Our goal here is to introduce a novel clustering method, which we call seed k-means clustering, in which a novel algorithm automatically finds good initial seeds for k-means clustering.
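The role of the initial seeds can be illustrated with a minimal sketch. The abstract does not describe the seed-finding algorithm itself, so the code below swaps in a simple farthest-first heuristic purely as a stand-in; the function names `farthest_first_seeds`, `k_means`, and `within_cluster_sse` are hypothetical and are not taken from the paper. The sketch only shows how explicitly chosen seeds are passed to an otherwise standard k-means loop.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_seeds(points, k, rng):
    """Stand-in seeding heuristic (an assumption, NOT the paper's algorithm).

    Start from one random point, then repeatedly add the point whose
    nearest already-chosen seed is farthest away.
    """
    seeds = [points[rng.randrange(len(points))]]
    while len(seeds) < k:
        next_seed = max(
            points,
            key=lambda p: min(squared_distance(p, s) for s in seeds),
        )
        seeds.append(next_seed)
    return seeds


def k_means(points, seeds, max_iter=100):
    """Standard k-means iterations started from the given seeds."""
    centers = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


def within_cluster_sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))


if __name__ == "__main__":
    rng = random.Random(0)
    # Three synthetic, well-separated groups standing in for expression profiles.
    blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    data = [
        (cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1))
        for cx, cy in blobs
        for _ in range(20)
    ]
    seeds = farthest_first_seeds(data, 3, rng)
    centers, labels = k_means(data, seeds)
    print("centers:", centers)
    print("SSE:", within_cluster_sse(data, centers, labels))
```

Replacing `farthest_first_seeds` with purely random picks and comparing the resulting `within_cluster_sse` across several runs is one way to observe the run-to-run variability that motivates seeding.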
Similar resources
Validation of k-means and Threshold based Clustering Method
Data mining is a process of extracting interesting hidden information from large databases. It can be applied to many databases, but the kind of patterns to be found is determined by the data mining technique used. Clustering is one of the data mining techniques that partitions a database into clusters such that data objects in the same cluster are similar and data objects belonging to different clusters are ...
Robust partitional clustering by outlier and density insensitive seeding
The leading partitional clustering technique, k-means, is one of the most computationally efficient clustering methods. However, it produces a locally optimal solution that strongly depends on its initial seeds. Bad initial seeds can also cause the splitting or merging of natural clusters even if the clusters are well separated. In this paper, we propose ROBIN, a novel method for initial seed se...
A hybrid clustering technique combining a novel genetic algorithm with K-Means
Many existing clustering techniques, including K-Means, require the user to supply the number of clusters. It is often extremely difficult for a user to accurately estimate the number of clusters in a data set. Genetic algorithms (GAs) generally determine the number of clusters automatically. However, they typically choose the genes and the number of genes randomly. If we can identify the right ...
Validity Measure of Cluster Based On the Intra-Cluster and Inter-Cluster Distance
The k-means method has been shown to be effective in producing good clustering results for many practical applications. However, a direct implementation of the k-means method requires time per iteration proportional to the product of the number of patterns and the number of clusters. This is computationally very expensive, especially for large datasets. The main disadvantage of the k-means algorithm is that the ...
ModEx and Seed-Detective: Two novel techniques for high quality clustering by using good initial seeds in K-Means
Keywords: Clustering; Classification; K-Means; Cluster evaluation; Data mining. Abstract: In this paper we present two clustering techniques called ModEx and Seed-Detective. ModEx is a modified version of an existing clustering technique called Ex-Detective. It addresses some limitations of Ex-Detective. Seed-Detective is a combination of ModEx and Simple KMeans. Seed-Detective uses ModEx to produce a set ...